66 research outputs found

    High-resolution sinusoidal analysis for resolving harmonic collisions in music audio signal processing

    Many music signals can largely be considered an additive combination of multiple sources, such as musical instruments or voice. If the musical sources are pitched instruments, the spectra they produce are predominantly harmonic, and are thus well suited to an additive sinusoidal model. However, due to resolution limits inherent in time-frequency analyses, when the harmonics of multiple sources occupy equivalent time-frequency regions, their individual properties are additively combined in the time-frequency representation of the mixed signal. Any such time-frequency point in a mixture where multiple harmonics overlap produces a single observation from which the contributions of the individual harmonics cannot be trivially deduced. These overlaps are referred to as overlapping partials or harmonic collisions. If one wishes to infer information about individual sources in music mixtures, the information carried in regions where collided harmonics exist becomes unreliable due to interference from other sources. This interference has ramifications in a variety of music signal processing applications such as multiple fundamental frequency estimation, source separation, and instrumentation identification. This thesis addresses harmonic collisions in music signal processing applications. As a solution to the harmonic collision problem, a class of signal subspace-based high-resolution sinusoidal parameter estimators is explored. Specifically, the direct matrix pencil method, or equivalently, the Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) method, is used with the goal of producing estimates of the salient parameters of individual harmonics that occupy equivalent time-frequency regions. This estimation method is adapted here to be applicable to time-varying signals such as musical audio.
    While high-resolution methods have previously been explored in the context of music signal processing, prior work has not addressed whether such methods truly produce high-resolution sinusoidal parameter estimates for real-world music audio signals. This thesis therefore answers the question of whether high-resolution sinusoidal parameter estimators are genuinely high-resolution for real music signals, directly examining the ability of this form of sinusoidal parameter estimation to resolve collided harmonics. The capabilities of the analysis method are also explored in the context of music signal processing applications: potential benefits of high-resolution sinusoidal analysis are examined in experiments involving multiple fundamental frequency estimation and audio source separation. This work shows that there are indeed benefits to high-resolution sinusoidal analysis in music signal processing applications, especially when compared to methods that derive sinusoidal parameter estimates from more traditional time-frequency representations. The benefits are most evident in multiple fundamental frequency estimation, where substantial performance gains are observed. High-resolution analysis in the context of computational auditory scene analysis-based source separation shows performance similar to that of existing comparable methods.
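    The subspace-based estimation this thesis builds on can be illustrated with a minimal, noiseless ESPRIT sketch that resolves two sinusoids spaced more closely than the DFT resolution 1/N (an illustrative toy, not the thesis's implementation; the signal, model order, and pencil parameter L are assumptions):

    ```python
    import numpy as np

    # Two real sinusoids whose frequencies are closer than the DFT resolution 1/N.
    N, f1, f2 = 256, 0.100, 0.103
    n = np.arange(N)
    x = np.sin(2 * np.pi * f1 * n) + 0.8 * np.sin(2 * np.pi * f2 * n)

    # Hankel data matrix; each real sinusoid contributes two complex exponentials.
    L = 64
    H = np.array([x[i:i + N - L + 1] for i in range(L)])

    # Signal subspace from the SVD (model order 4 = 2 sinusoids x 2 exponentials).
    U, s, _ = np.linalg.svd(H, full_matrices=False)
    Us = U[:, :4]

    # Rotational invariance: the shift-invariance of the subspace yields the poles.
    Phi = np.linalg.pinv(Us[:-1]) @ Us[1:]
    poles = np.linalg.eigvals(Phi)
    freqs = np.sort(np.angle(poles) / (2 * np.pi))
    est = freqs[freqs > 0]  # keep the positive-frequency pair
    print(np.round(est, 4))  # two estimates, near 0.100 and 0.103
    ```

    With noiseless data the Hankel matrix has exact rank 4, so the two frequencies are recovered even though they fall within a single DFT bin width; this separation ability is what "high-resolution" refers to above.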

    BAW-Brief Nr. 1 – Januar 2008

    595-B, Structural engineering (Bautechnik): new design rules for hangers on tied-arch bridges (Neue Bemessungsregeln für Hänger an Stabbogenbrücken). 594-B, Structural engineering: BAW bulletin (BAW-Merkblatt) Zweitbeto

    Contrastive Learning for Cross-modal Artist Retrieval

    Music retrieval and recommendation applications often rely on content features encoded as embeddings, which provide vector representations of items in a music dataset. Numerous complementary embeddings can be derived from processing items originally represented in several modalities, e.g., audio signals, user interaction data, or editorial data. However, data of any given modality might not be available for all items in any music dataset. In this work, we propose a method based on contrastive learning to combine embeddings from multiple modalities and explore the impact of the presence or absence of embeddings from diverse modalities in an artist similarity task. Experiments on two datasets suggest that our contrastive method outperforms single-modality embeddings and baseline algorithms for combining modalities, both in terms of artist retrieval accuracy and coverage. Improvements with respect to other methods are particularly significant for less popular query artists. We demonstrate that our method successfully combines complementary information from diverse modalities, and is more robust to missing modality data (i.e., it better handles retrieving artists whose available modality embeddings differ from the query artist's).
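    The contrastive objective behind this kind of cross-modal alignment can be sketched with a symmetric InfoNCE loss over paired modality embeddings (a generic sketch under assumed names and toy data, not the paper's exact method; the temperature and batch are illustrative):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def info_nce(za, zb, tau=0.1):
        """Symmetric InfoNCE: matched rows of za/zb are positive pairs (sketch)."""
        za = za / np.linalg.norm(za, axis=1, keepdims=True)
        zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
        logits = za @ zb.T / tau  # cosine similarities scaled by temperature
        # Cross-entropy in both directions, with the diagonal as the targets.
        log_p_ab = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        log_p_ba = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
        return -0.5 * (np.mean(np.diag(log_p_ab)) + np.mean(np.diag(log_p_ba)))

    # Toy batch: 8 artists, an audio embedding and a second-modality embedding each.
    audio = rng.normal(size=(8, 16))
    other = audio + 0.1 * rng.normal(size=(8, 16))  # well-aligned second modality
    print(info_nce(audio, other) < info_nce(audio, other[::-1]))  # aligned pairs score lower loss
    ```

    Minimizing such a loss pulls each artist's embeddings from different modalities together while pushing apart embeddings of different artists, which is what lets a query in one modality retrieve items represented in another.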

    Supervised and Unsupervised Learning of Audio Representations for Music Understanding

    In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal characteristics, tempo and sonority. Specifically, we explore how the domain of pre-training datasets (music or generic audio) and the pre-training methodology (supervised or unsupervised) affects the adequacy of the resulting audio embeddings for downstream tasks. We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance in a wide range of music labelling tasks, each with novel content and vocabularies. This can be done in an efficient manner with models containing less than 100 million parameters that require no fine-tuning or reparameterization for downstream tasks, making this approach practical for industry-scale audio catalogs. Within the class of unsupervised learning strategies, we show that the domain of the training dataset can significantly impact the performance of representations learned by the model. We find that restricting the domain of the pre-training dataset to music allows for training with smaller batch sizes while achieving state-of-the-art results in unsupervised learning -- and in some cases, supervised learning -- for music understanding. We also corroborate that, while achieving state-of-the-art performance on many tasks, supervised learning can cause models to specialize to the supervised information provided, somewhat compromising a model's generality.

    Melody Transcription From Music Audio: Approaches and Evaluation


    Excitations of single-beauty hadrons

    In this work we study the predominantly orbital and radial excitations of hadrons containing a single heavy quark. We present meson and baryon mass splittings and ratios of meson decay constants (e.g., $f_{B_s}/f_B$ and $f_{B_s'}/f_{B_s}$) resulting from quenched and dynamical two-flavor configurations. Light quarks are simulated using the chirally improved (CI) lattice Dirac operator at valence masses as light as $M_\pi \approx 350$ MeV. The heavy quark is approximated by a static propagator, appropriate for the $b$ quark on our lattices ($1/a \sim 1$-$2$ GeV). We also include some preliminary calculations of the $O(1/m_Q)$ kinetic corrections to the states, showing, in the process, a viable way of applying the variational method to three-point functions involving excited states. We compare our results with recent experimental findings. Comment: 23 pages, 18 figures, 17 tables; slight title change (Ed. killjoy); reference added; version to appear in Phys Rev